Abstract. We study the following fundamental realization problem of directed acyclic graphs (dags). Given a sequence $S := \binom{a_1}{b_1}, \dots, \binom{a_n}{b_n}$ with $a_i, b_i \in \mathbb{Z}_0^+$, does there exist a dag (no parallel arcs allowed) with labeled vertex set $V := \{v_1, \dots, v_n\}$ such that for all $v_i \in V$ indegree and outdegree of $v_i$ match exactly the given numbers $a_i$ and $b_i$, respectively? Recently this decision problem has been shown to be NP-complete by Nichterlein [Nic11]. However, we can show that several important classes of sequences are efficiently solvable.

In previous work [BM11], we have proved that yes-instances always have a special kind of topological order which allows us to reduce the number of possible topological orderings in most cases drastically. This leads to an exact exponential-time algorithm which significantly improves upon a straightforward approach. Moreover, a combination of this exponential-time algorithm with a special strategy gives a linear-time algorithm. Interestingly, in systematic experiments we observed that we could solve a huge majority of all instances by the linear-time heuristic. This motivates us to develop characteristics like dag density and "distance to provably easy sequences" which can give us an indicator of how easy or difficult it is to realize a given sequence.

Furthermore, we propose a randomized algorithm which exploits our structural insight on topological sortings and uses a number of reduction rules. We compare this algorithm with other straightforward randomized algorithms in extensive experiments. We observe that it clearly outperforms all other variants and behaves surprisingly well for almost all instances. Another striking observation is that our simple linear-time algorithm solves a set of real-world instances from different domains, namely ordered binary decision diagrams (OBDDs), train and flight schedules, as well as instances derived from food-web networks, without any exception.



1 The Dag Realization Problem 

Dag realization problem: Given is a finite sequence $S := \binom{a_1}{b_1}, \dots, \binom{a_n}{b_n}$ with $a_i, b_i \in \mathbb{Z}_0^+$. Does there exist an acyclic digraph (without parallel arcs) $G = (V, A)$ with the labeled vertex set $V := \{v_1, \dots, v_n\}$ such that we have indegree $d_G^-(v_i) = a_i$ and outdegree $d_G^+(v_i) = b_i$ for all $v_i \in V$?

If the answer is "yes", we call sequence $S$ a dag sequence and the acyclic digraph $G$ (a so-called "dag") a dag realization. A relaxation of this problem - not demanding the acyclicity of digraph $G$ - is called digraph realization problem. In this case, we call $G$ a digraph realization and $S$ a digraph sequence. The digraph realization problem can be solved in linear time using an algorithm by Wang and Kleitman [KW73]. Unless explicitly stated, we assume that a sequence does not contain any zero tuples $\binom{0}{0}$. Moreover, we will tacitly assume that $\sum_{i=1}^{n} a_i = \sum_{i=1}^{n} b_i$, as this is obviously a necessary condition for any realization to exist, since the number of ingoing arcs must equal the number of outgoing arcs. Furthermore, we denote tuples $\binom{a_i}{b_i}$ with $a_i > 0$ and $b_i = 0$ as sink tuples, those with $a_i = 0$ and $b_i > 0$ as source tuples, and the remaining ones with $a_i > 0$ and $b_i > 0$ as stream tuples. We call a sequence consisting only of source and sink tuples a source-sink-sequence. A sequence $S = \binom{a_1}{b_1}, \dots, \binom{a_n}{b_n}$ with $q$ source tuples and $s$ sink tuples is called canonically sorted if and only if the first $q$ tuples in this labeling are decreasingly sorted source tuples (with respect to the $b_i$) and the last $s$ tuples are increasingly sorted sink tuples (with respect to the $a_i$).

Hardness and efficiently solvable special cases. Nichterlein very recently showed that the dag realization problem is NP-complete [Nic11]. On the other hand, there are several classes of sequences for which the problem is not hard. One such class consists of source-sink-sequences, for which one only has to find a digraph realization. The latter is already a dag realization, since no vertex has incoming as well as outgoing arcs. Furthermore, sparse sequences with $\sum_{i=1}^{n} a_i \le n - 1$ are polynomial-time solvable, as we will show below. We call such sequences forest sequences. The main difficulty for the dag realization problem is to find a "topological ordering of the sequence". In the case where we have one, our problem is nothing else but a directed f-factor problem on a complete dag. The labeled vertices of this complete dag are ordered in the given topological order. This problem can be reduced to a bipartite undirected f-factor problem which can be solved in polynomial time via a further famous reduction by Tutte [Tut52] to a bipartite perfect matching problem. In a previous paper [BM11], we proved that a certain ordering of a special class of sequences - opposed sequences - always leads to a topological ordering of the tuples for at least one dag realization of a given dag sequence. On the other hand, it is not necessary to apply the reduction via Tutte if we possess one possible topological ordering of a dag sequence. The solution is much easier. Next, we describe our approach.

Realization with a prescribed topological order. We call a dag sequence $S := \binom{a_1}{b_1}, \dots, \binom{a_n}{b_n}$ which possesses a dag realization with a topological numbering corresponding to the increasing numbering of its tuples a dag sequence for a given topological order, and analogously the digraph $G = (V, A)$ a dag realization for a given topological order. Without loss of generality, we may assume that the source tuples come first in the prescribed numbering and are ordered decreasingly with respect to their $b_i$ values. A realization algorithm works as follows. Consider the first tuple $\binom{a_{q+1}}{b_{q+1}}$ from the prescribed topological order which is not a source tuple. Then there must exist at least $a_{q+1}$ source tuples with a smaller number in the given dag sequence. Reduce the $a_{q+1}$ first (i.e. with largest $b_i$) source tuples by one and set the indegree of tuple $\binom{a_{q+1}}{b_{q+1}}$ to 0. That means, we reduce sequence $S := \binom{0}{b_1}, \dots, \binom{a_{q+1}}{b_{q+1}}, \dots, \binom{a_n}{b_n}$ to sequence

$$S' := \binom{0}{b_1 - 1}, \dots, \binom{0}{b_{a_{q+1}} - 1}, \binom{0}{b_{a_{q+1}+1}}, \dots, \binom{0}{b_q}, \binom{0}{b_{q+1}}, \binom{a_{q+2}}{b_{q+2}}, \dots, \binom{a_n}{b_n}.$$

If we get zero tuples in $S'$, then we delete them and denote the new sequence for simplicity also by $S'$. Furthermore, we label this sequence with a new numbering starting from one to its length and consider this sorting as the given topological ordering for $S'$. We repeat this process until we get an empty sequence (corresponding to the realizability of $S$) or get stuck (corresponding to the non-realizability of $S$). The correctness of our algorithm is proven in Lemma 1.

Lemma 1. $S$ is a dag sequence for a given topological order $\Leftrightarrow$ $S'$ is a dag sequence for its corresponding topological order.
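The reduction for a prescribed topological order can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `realize_in_order` is hypothetical, tuples are given as `(a, b)` pairs already listed in the prescribed order, and the sketch favors brevity over the linear running time discussed in the text (it re-sorts the candidates in every step).

```python
def realize_in_order(seq):
    """Try to realize seq = [(a_1, b_1), ..., (a_n, b_n)] as a dag whose
    topological order is the given numbering. Returns a list of arcs
    (i, j) with i < j (0-based), or None if the reduction gets stuck."""
    residual = [b for _, b in seq]  # outdegree each vertex still has to use
    arcs = []
    for j, (a, _) in enumerate(seq):
        # Earlier vertices that can still emit an arc, largest residual first.
        candidates = sorted(
            (i for i in range(j) if residual[i] > 0),
            key=lambda i: -residual[i],
        )
        if len(candidates) < a:
            return None  # stuck: not enough "sources" before position j
        for i in candidates[:a]:
            residual[i] -= 1
            arcs.append((i, j))
    # All outdegrees must be used up (the "empty sequence" case).
    return arcs if not any(residual) else None
```

Once a stream tuple has received its indegree it behaves like a source for all later positions, which is why the sketch only tracks residual outdegrees of earlier vertices.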

Discussion of our main theorem and its corresponding algorithm. We do not know how to determine a feasible topological ordering (i.e., one corresponding to a realization) for an arbitrary dag sequence. However, we are able to restrict the types of possible permutations of the tuples. For that, we need the following order relation $\le_{opp} \subset \mathbb{Z}^2 \times \mathbb{Z}^2$, introduced in [BM11].

Definition 1 (opposed relation). Given are $c_1 := \binom{a_1}{b_1} \in \mathbb{Z}^2$ and $c_2 := \binom{a_2}{b_2} \in \mathbb{Z}^2$. We define: $c_1 \le_{opp} c_2 \Leftrightarrow (a_1 \le a_2 \wedge b_1 \ge b_2)$.

Note that a pair $c_1$ equals $c_2$ with respect to the opposed relation if and only if $a_1 = a_2$ and $b_1 = b_2$. The opposed relation is reflexive, transitive and antisymmetric and therefore a partial, but not a total order. Our following theorem leads to a recursive algorithm with exponential running time and results in Corollary 1, which proves the existence of a special type of possible topological sortings provided that sequence $S$ is a dag sequence.

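Theorem 1 (main theorem [BM11]). Let $S$ be a canonically sorted sequence containing $k > 0$ source tuples. Furthermore, we assume that $S$ is not a source-sink-sequence. We define the set

$$V_{min} := \left\{ \binom{a_i}{b_i} \;\middle|\; \binom{a_i}{b_i} \text{ is a stream tuple, } a_i \le k, \text{ and there is no stream tuple } \binom{a_j}{b_j} <_{opp} \binom{a_i}{b_i} \right\}.$$

$S$ is a dag sequence if and only if $V_{min} \neq \emptyset$ and there exists an element $\binom{a_i}{b_i} \in V_{min}$ such that

$$S' := \binom{0}{b_1 - 1}, \dots, \binom{0}{b_{a_i} - 1}, \binom{0}{b_{a_i + 1}}, \dots, \binom{0}{b_k}, \dots, \binom{a_{i-1}}{b_{i-1}}, \binom{0}{b_i}, \binom{a_{i+1}}{b_{i+1}}, \dots, \binom{a_n}{b_n}$$

is a dag sequence.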
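To make the candidate set of Theorem 1 concrete, the following Python sketch computes $V_{min}$ for a sequence given as a list of `(a, b)` pairs. The names `lt_opp` and `v_min` are hypothetical helpers introduced here for illustration, and duplicate stream tuples are collapsed into one representative, which is an assumption of this sketch.

```python
def lt_opp(c, d):
    """Strict opposed relation: c <_opp d, i.e. c <=_opp d and c != d."""
    return c != d and c[0] <= d[0] and c[1] >= d[1]


def v_min(seq):
    """Stream tuples (a, b) with a <= k (k = number of source tuples)
    for which no strictly smaller stream tuple w.r.t. <_opp exists."""
    k = sum(1 for a, b in seq if a == 0 and b > 0)
    streams = sorted({t for t in seq if t[0] > 0 and t[1] > 0})
    return [
        t for t in streams
        if t[0] <= k and not any(lt_opp(s, t) for s in streams)
    ]
```

For instance, with two sources and the stream tuples `(1, 3)` and `(2, 2)`, only `(1, 3)` survives, because `(1, 3)` is strictly smaller than `(2, 2)` in the opposed order.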

Sequence $S'$ may contain zero tuples. If this is the case, we delete them and call the new sequence for simplicity also $S'$. Theorem 1 ensures the possibility of reducing a dag sequence to a source-sink-sequence. The latter can be realized by using the algorithm for realizing digraph sequences [KW73]. The whole algorithm is summarized in Algorithm 1, where we consider the maximum subset $V'_{min}$ of $V_{min}$ only containing pairwise disjoint stream tuples. The bottleneck of this approach is the size of set $V'_{min}$. We give an example for the execution of Algorithm 1 in the Appendix. Our pseudocode does not specify the order in which we process the elements of $V'_{min}$ in line 3. Several strategies are possible which have a significant influence on the overall performance. The most promising deterministic strategy (as we will learn in the next sections) is to use the lexicographic order, starting with the lexicographic maximum element within $V_{min}$. In [BM11] we introduced a special class of dag sequences - opposed sequences - where we have $|V'_{min}| = 1$, if sequence $S$ is not a source-sink-sequence. We call a
sequence S opposed sequence, if it is possible to sort its stream tuples in such a way, 
Algorithm 1: DagRealization(sequence S)

Input : A canonically sorted sequence $S$.
Output: A Boolean flag indicating whether $S$ is realizable.
 1 if $S$ is not a source-sink-sequence then
 2     count the number of sources in $S$ and determine set $V'_{min}$;
 3     for all $\binom{a_j}{b_j} \in V'_{min}$ do
 4         create a working copy $S'$ of $S$ with tuples $\binom{a'_i}{b'_i} = \binom{a_i}{b_i}$;
 5         set $b'_i \leftarrow b'_i - 1$ for the $a'_j$ largest sources $\binom{0}{b'_i}$;
 6         set $a'_j \leftarrow 0$;
 7         delete $\binom{0}{0}$-tuples;
 8         if DagRealization($S'$) then return TRUE;
 9     return FALSE;
10 else // Realization of a source-sink-sequence
11     while the set of source tuples in $S$ is not empty do
12         choose a largest source tuple $\binom{0}{b_j}$;
13         if number of sinks in $S$ is smaller than $b_j$ then return FALSE;
14         set $a_i \leftarrow a_i - 1$ for the $b_j$ largest sinks $\binom{a_i}{0}$;
15         delete $\binom{0}{0}$-tuples;
16     return TRUE;


that $a_i \le a_{i+1}$ and $b_i \ge b_{i+1}$ is valid for stream tuples with indices $i$ and $i + 1$. In this case, we have the property $\binom{a_i}{b_i} \le_{opp} \binom{a_{i+1}}{b_{i+1}}$ for all stream tuples. At the beginning of the sequence we insert all source tuples such that the $b_i$ build a decreasing sequence, and at the end of sequence $S$ we put all sink tuples in increasing ordering with respect to the corresponding $a_i$. The notion opposed sequence describes a sequence where it is possible to compare all stream tuples among each other and to put them in a "chain". Indeed, this is not always possible because the opposed order is not a total order. However, for opposed sequences line (3) to line (9) in Algorithm 1 are executed at most once in each recursive call, because we always have $|V'_{min}| \le 1$. Overall, we obtain a linear-time algorithm for opposed sequences. However, there are many sequences which are not opposed, but for which Theorem 1 still yields a polynomial decision time. Consider for example dag sequence S := m, u), (ij, u), (J), (q), (q) which is not an opposed sequence, because stream tuples ( 2 ) and ( 3 ) are not comparable with respect to the opposed ordering. However, we have $|V'_{min}|$ = |{( 2 )}| = 1 and so we reduce S to S' = ( 2 ) , ( 2 ) , ( 2 ) , ( 3 ) , ( ) , ( ) , ( ) , leading to the realizable source-sink-sequence (°), (°), (°), (g), (J), Q, (q). Theorem 1 leads to further interesting insights. We can prove the existence of special topological sortings.
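As a rough illustration of Algorithm 1, the recursion can be written in Python as follows. This is a sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, tuples are `(a, b)` pairs, the input is assumed to satisfy $\sum a_i = \sum b_i$, and the candidates in line 3 are tried in decreasing lexicographic order (the lexmax choice first, followed by backtracking over the remaining candidates).

```python
def realize_source_sink(seq):
    """Lines 10-16: greedy test for a source-sink-sequence."""
    sources = sorted((b for a, b in seq if a == 0), reverse=True)
    sinks = [a for a, b in seq if b == 0]
    for b in sources:  # largest source first
        sinks = sorted((a for a in sinks if a > 0), reverse=True)
        if len(sinks) < b:
            return False
        for i in range(b):  # reduce the b largest sinks by one
            sinks[i] -= 1
    return not any(sinks)


def v_min(seq):
    """Candidate set of Theorem 1 (duplicates collapsed)."""
    k = sum(1 for a, b in seq if a == 0 and b > 0)
    streams = {t for t in seq if t[0] > 0 and t[1] > 0}
    return [
        t for t in streams
        if t[0] <= k and not any(
            s != t and s[0] <= t[0] and s[1] >= t[1] for s in streams
        )
    ]


def dag_realization(seq):
    """Decide realizability by the recursion of Algorithm 1."""
    seq = [t for t in seq if t != (0, 0)]  # delete zero tuples
    if not any(a > 0 and b > 0 for a, b in seq):
        return realize_source_sink(seq)
    for a, b in sorted(v_min(seq), reverse=True):  # lexmax element first
        sources = sorted((t[1] for t in seq if t[0] == 0), reverse=True)
        rest = [t for t in seq if t[0] > 0]
        rest.remove((a, b))
        # Reduce the a largest sources; the chosen tuple becomes a source.
        reduced = [x - 1 for x in sources[:a]] + sources[a:] + [b]
        if dag_realization([(0, x) for x in reduced] + rest):
            return True
    return False
```

Restricting the loop to its first iteration gives a deterministic single-path variant in the spirit of the lexmax strategy; the full loop is exponential in the worst case.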

Corollary 1 ([BM11]). For every dag sequence $S$, there exists a dag realization $G = (V, A)$ with a topological ordering $v_{i_1}, \dots, v_{i_{n_s}}$ of all $n_s$ vertices corresponding to stream tuples, such that we cannot find $\binom{a_{i_j}}{b_{i_j}} <_{opp} \binom{a_{i_{j'}}}{b_{i_{j'}}}$ for $j' < j$.

We call a topological ordering of a dag sequence obeying the conditions in Corollary 1 an opposed topological sorting. At the beginning of our work (when the complexity of the dag realization problem was still open), we conjectured that the choice of the lexicographically largest tuple from $V'_{min}$ in line (3) would solve our problem in polynomial time. We call this approach lexmax strategy and a dag sequence which is realizable with this strategy lexmax sequence, otherwise we call it non-lexmax sequence. Hence, we conjectured the following.

Conjecture 1 (lexmax conjecture). Each dag sequence is a lexmax sequence.

We soon disproved our own conjecture by a counter-example (Example 1, described in the following section and in Appendix B). In systematic experiments we found out that a large fraction of sequences can be solved by this strategy in polynomial time. We tell this story in the next Section 2. Moreover, we use the structural insights from our main theorem to develop a randomized algorithm which performs well in practice (Section 3). Proofs and further supporting material can be found in the Appendix and in [Ber11].

2 Lessons from Experiments with the Lexmax Strategy 

Why we became curious. To see whether our lexmax conjecture (Conjecture 1) might be true, we generated a set of dag sequences, called randomly generated sequences in the sequel, by the following principle: Starting with a complete acyclic digraph, delete $k$ of its arcs uniformly at random. We take the degree sequence from the resulting graph. Note that we only sample uniformly with respect to random dags but not uniformly with respect to degree sequences, since degree sequences have different numbers of corresponding dag realizations. In a first experiment we created with the described process one million dag sequences with 20 tuples each, and $m = \sum_{i=1}^{20} a_i = 114$. Likewise, we built up another million dag sequences with 25 tuples and $\sum_{i=1}^{25} a_i = 180$. The fact that the lexmax strategy realized all these test instances without a single failure was quite encouraging. The lexmax conjecture seemed to be true; only a correctness proof was missing. But quite soon, in an attempt to prove the conjecture, we artificially constructed a first counter-example, a dag sequence which is definitely not a lexmax sequence, as can easily be verified:

Example 1. S := (°), (°) , Q , (3) , (4) , ([) , (J) , Q , (jj) . Details are shown in Appendix B.
The rightmost path in the recursion tree shown in FigurefTTlcorresponds to the lexmax 
strategy, but is unsuccessful. 
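The sampling procedure described at the beginning of this section (start with a complete acyclic digraph, delete $k$ arcs uniformly at random, read off the degree sequence) might be sketched as follows. The function name and the `rng` parameter are assumptions made for this illustration; note that the sketch does not filter out zero tuples, which can occur when many arcs are deleted.

```python
import random
from itertools import combinations


def random_dag_sequence(n, k, rng=random):
    """Degree sequence of a complete dag on n vertices after deleting
    k arcs uniformly at random. Returns a list of (indegree, outdegree)."""
    arcs = list(combinations(range(n), 2))  # complete dag: i -> j for i < j
    # Deleting k arcs uniformly at random = keeping a uniform random
    # subset of size len(arcs) - k.
    kept = rng.sample(arcs, len(arcs) - k)
    indeg = [0] * n
    outdeg = [0] * n
    for i, j in kept:
        outdeg[i] += 1
        indeg[j] += 1
    return list(zip(indeg, outdeg))
```

With `n = 20` the complete dag has 190 arcs, so `k = 76` yields sequences with $m = 114$ as in the first experiment; with `n = 25` and `k = 120` one obtains $m = 180$.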

Even worse: we also found an example (Example 2) showing that every fixed strategy which chooses an element from $V'_{min}$ in Algorithm 1 without considering the corresponding set of sinks will fail in general.

Example 2. We consider the two sequences 

'0\ /0\ /0\ /0\ /0\ /5\ /5\ /2\ /2\ /1\ /1\ /2\ /6\ /9 



'''''' \->J Vr,y \r K r VlJ'VU' \r,y \:,J- \->l \2)' Vn7" UiJ ' \nj ' \i)J " \() 



and 

S 2 := 



%mmmmaai 



only differing in their sink tuples. Sequence $S_2$ can only be realized by the lexmax strategy, while several strategies but not the lexmax strategy work for $S_1$. Thus, there is no strategy which can be applied in both cases.
Fig. 1. Percentage of (non-trivial) lexmax sequences for systematically generated (blue squares) and randomly generated sequences (red triangles) with 9 tuples and $m \in \{5, \dots, 35\}$.

Fig. 2. Fraction of systematic non-lexmax sequences with 9 tuples, $m \in \{9, \dots, 35\}$, and varying difference to opposed $d(S)$.



These observations give rise to several immediate questions: Why did we construct by our sampling method (for $n = 20$ and $n = 25$) only dag sequences which are lexmax sequences? How many dag sequences are not lexmax sequences? Therefore, we started with systematic experiments. For small instances with $n \in \{7, 8, 9\}$ tuples we generated systematically the set of all dag sequences with all possible $\sum_{i=1}^{n} a_i =: m$; see for an example the case $n = 9$ in Figure 1 and Appendix C. More precisely, we considered only non-trivial sequences, i.e. we eliminated all source-sink sequences and all sequences with only one stream tuple. We denote this set by systematically generated sequences. Note that the number of sequences grows so fast in $n$ that a systematic construction of all sequences with a larger size is impossible. We observed the following:

1. The fraction of lexmax sequences among the systematically generated sequences is quite high. For all $m$ it is above 96.5%, see Figure 1 (blue squares).

2. The fraction of lexmax sequences strongly depends on $m$. It is largest for sparse and dense dags.

3. Lexmax sequences are overrepresented among one million randomly generated sequences (for each $m$): we observe more than 99% for all densities of dags, see Figure 1 (red triangles).

This leads to the following questions: Given a sequence for which we seek a dag realization, how should we proceed in practice? As we have seen, the huge majority of dag sequences are lexmax sequences. Is it possible to find characteristic properties for lexmax sequences or non-lexmax sequences, respectively?

Fig. 3. Percentage of systematically generated sequences $S$ (left) and randomly generated sequences (right) with their difference $d(S)$ to opposed for $n = 9$ tuples and $m \in \{9, \dots, 35\}$.

Distance to opposed sequences. Let us exploit our characterization that opposed sequences are efficiently solvable. We propose the distance to opposed $d(S)$ for each dag sequence $S$. Consider for that the topological order of a dag realization $G$ given by Algorithm 1 if in line (3) elements are chosen in decreasing lexicographical order. This ordering corresponds to exactly one path of the recursion tree. Thus, we obtain one unique dag realization $G$ for $S$, if one exists. Now, we renumber dag sequence $S$ such that it follows the topological order induced by the execution of this algorithm, i.e. by the sequence of choices of elements from $V'_{min}$. Then the distance to opposed is defined as the number of pairwise incomparable stream tuples with respect to this order, more precisely,

$$d(S) := \left| \left\{ \left( \binom{a_i}{b_i}, \binom{a_j}{b_j} \right) \;\middle|\; \binom{a_i}{b_i}, \binom{a_j}{b_j} \text{ are incomparable stream tuples w.r.t. } \le_{opp} \text{ and } i < j \right\} \right|.$$


Question 1: Do randomly generated sequences possess a preference for a "small" distance to opposed in comparison with systematically generated sequences? In Figure 3 (left), we show the distribution of systematically generated sequences (in %) with their distance to opposed, depending on $m := \sum_{i=1}^{n} a_i$. We compare this scenario with the same setting for randomly generated sequences, shown in Figure 3 (right).

Observations: Systematically generated sequences have a slightly larger range of 
the "distance to opposed" than randomly generated sequences. Moreover, when we 
generate dag sequences systematically, we obtain a significantly larger fraction of 
instances with a larger distance to opposed than for randomly generated sequences, 
and this phenomenon can be observed for all m. 

Question 2: Do non-lexmax sequences possess a preference for large opposed distances? Since opposed sequences are easily solvable [BM11], we conjecture that sequences with a small distance to opposed might be more easily solvable by the lexmax strategy than those with a large distance to opposed. If this conjecture were true, it would give us, together with our findings from Question 1, one possible explanation for the observation that the randomly generated sequences have a larger fraction of sequences that are efficiently solvable by the lexmax strategy.

Observations: A separate analysis of non-lexmax sequences (that is, the subset of instances left unsolved by the lexmax strategy), displayed in Figure 2, gives a clear picture: yes! For systematically generated sequences with $n = 9$, we observe in particular for instances with a middle density that the fraction of non-lexmax sequences becomes maximal for a relatively large distance to opposed.
name and kind of network     n        m          b        dag density $\rho$   norm. dist. to opposed
burgess shale (b)            142      770        101      0.08                 0.40
chengjiang shale (b)         85       559        54       0.16                 0.50
florida bay dry (b)          128      2137       125      0.26                 0.32
cypress dry (b)              71       640        68       0.26                 0.43
maspalomas (b)               24       82         21       0.30                 0.30
rhode river (b)              20       53         17       0.28                 0.42
train schedule 2011 (c)      19359    77201      18907    0.0004               0.00
flight schedule 2010 (d)     37800    1324556    32905    0.0019               0.00

Table 1. Characteristics of our real-world test instances.



Question 3: Can we solve real-world instances by the lexmax strategy? We consider real-world instances from different domains.

a): Ordered binary decision diagrams (OBDDs): In such networks the outdegree is two, that is, constant. This immediately implies that the corresponding sequences are opposed sequences, and hence can provably be solved by the lexmax strategy.

b): Food Webs: Such networks are almost hierarchical and therefore have a strong tendency to be acyclic ("larger animals eat smaller animals"). In our experiments we analyzed food webs from the Pajek network library [Bat04].

c): Train timetable network: We use timetable data of German Railways from 2011 and form a time-expanded network. Its vertices correspond to departure and arrival events of trains; a departure vertex is connected by an arc with the arrival event corresponding to the very next train stop. Moreover, arrival and departure events at the same station are connected whenever a transfer between trains is possible or if the two events correspond to the very same train.

d): Flight timetable network: We use the European flight schedule of 2010 and form a time-expanded network as in c).

The characteristics of our real-world networks b) - d) are summarized in Table 1. The dag density $\rho$ of a network is defined as $\rho = m / \binom{n}{2}$. To compare the distance to opposed for instances of different sizes, we normalize this value by the theoretical maximum $\binom{b}{2}$, where $b$ denotes the number of stream tuples, and so obtain a normalized distance to opposed. Without any exception, all real-world instances have been realized by the lexmax strategy.
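The two normalized quantities used in Table 1 are simple ratios and can be checked directly. The sketch below uses hypothetical function names; returning 0.0 for fewer than two stream tuples is a convention chosen here, not something stated in the text.

```python
from math import comb


def dag_density(n, m):
    """rho = m / C(n, 2): arcs relative to the complete dag on n vertices."""
    return m / comb(n, 2)


def normalized_distance(d, b):
    """Distance to opposed d divided by its maximum C(b, 2),
    where b is the number of stream tuples."""
    return d / comb(b, 2) if b >= 2 else 0.0
```

For the rhode river instance ($n = 20$, $m = 53$) this gives $53 / 190 \approx 0.28$, in line with the density column of Table 1.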

Back to theory. Inspired by our observations in the systematic experiments, we reconsidered forest sequences. We can show that an arbitrary choice of a tuple in $V'_{min}$ in line 3 of Algorithm 1 solves the problem for forest sequences.

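Theorem 2 (Realization of forest dags). Let $S := \binom{a_1}{b_1}, \dots, \binom{a_n}{b_n}$ with $\sum_{i=1}^{n} a_i \le n - 1$ be a canonically sorted sequence containing $k > 0$ source tuples. Furthermore, we assume that $S$ is not a source-sink-sequence. Consider an arbitrary stream tuple $\binom{a_i}{b_i}$ with $a_i \le k$. $S$ is a dag sequence if and only if

$$S' := \binom{0}{b_1 - 1}, \dots, \binom{0}{b_{a_i} - 1}, \binom{0}{b_{a_i + 1}}, \dots, \binom{a_{i-1}}{b_{i-1}}, \binom{0}{b_i}, \binom{a_{i+1}}{b_{i+1}}, \dots, \binom{a_n}{b_n}$$

is a dag sequence.
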
Note that sequence $S'$ may contain zero tuples. In this case, we delete these tuples and renumber the tuples of this new sequence $S' := \binom{a'_1}{b'_1}, \dots, \binom{a'_{n'}}{b'_{n'}}$ from 1 to $n'$.

Clearly, we have $\sum_{i=1}^{n'} a'_i \le n - a_i - 1 \le n' - 1$, because we deleted exactly the indegree of tuple $\binom{a_i}{b_i}$ in $S$ and it is only possible to delete at most $a_i$ new zero tuples in $S'$. Hence, Theorem 2 results in a recursive algorithm. At each step, one has to choose an arbitrary stream tuple $\binom{a_i}{b_i}$ with indegree of at most $k$ and then to reduce the $a_i$ largest sources by one and to set the indegree $a_i$ of this tuple to zero. On the other hand, the set $V_{min}$ of Theorem 1 is a subset of the allowed tuple set in Theorem 2. Hence, we get the following corollary.

Corollary 2 (arbitrary tuple choice in $V_{min}$ for forest sequences). Let $S := \binom{a_1}{b_1}, \dots, \binom{a_n}{b_n}$ with $\sum_{i=1}^{n} a_i \le n - 1$ be a canonically sorted sequence containing $k > 0$ source tuples. Furthermore, let $S'$ be defined as in Theorem 1, where $\binom{a_i}{b_i}$ is an arbitrary tuple in $V_{min}$.

$S$ is a dag sequence if and only if $S'$ is a dag sequence.

3 Randomized Algorithms 

3.1 Four versions of randomized algorithms 

The main idea for developing a randomized algorithm is the following. In each trial use a randomly chosen topological sorting (a random permutation of the tuples) for a given sequence and then apply the linear-time realization algorithm as described in Section 1 and justified by Lemma 1. Clearly, it is not necessary to permute all tuples in a sequence. Instead we use a canonically sorted sequence and permute only the stream tuples. We denote this first naive version of a randomized algorithm by stream tuple permutation algorithm (Rand I). A random permutation of a sequence of length $n$ can be chosen in $O(n)$ time, see for example [Dur64]. Hence, one trial of the stream tuple permutation algorithm requires $O(m + n)$ time. This algorithm performs poorly since there are sequences with only a single realization among $(n - 2)!$ many permutations of $n - 2$ stream tuples. On the other hand, it is possible to restrict the number of possible topological sortings by the following lemma.

Lemma 2 (necessary criterion for the realizability of dag sequences). Let $S$ be a dag sequence. Denote the number of source tuples in $S$ by $q$ and the number of sink tuples by $s$. Then it follows $a_i \le \min\{n - s, i - 1\}$ and $b_i \le \min\{n - q, n - i\}$ for all $i \in \mathbb{N}_n$ for each labeling of $S$ corresponding to a topological order.

Hence, a stream tuple $\binom{a_j}{b_j}$ can only be at position $i$ in a topological ordering if $a_j \le \min\{n - s, i - 1\}$ and $b_j \le \min\{n - q, n - i\}$ is fulfilled. We define a bipartite bounding graph $B_S = (V_S \cup W_S, E_S)$ for a given canonically sorted sequence as follows. We define $|S| - q - s$ vertices $v_i \in V_S$ with $i \in \{q + 1, \dots, n - s\}$, where each vertex $v_i$ corresponds to an "upper bound tuple" $\binom{\min\{n - s, i - 1\}}{\min\{n - q, n - i\}}$ for a stream tuple in $S$. Furthermore, we define $|S| - q - s$ vertices $w_i \in W_S$ with $i \in \{q + 1, \dots, n - s\}$, each corresponding to a stream tuple $\binom{a_i}{b_i}$. The edge set $E_S$ is built as follows. Two vertices $v_i$ and $w_j$ are adjacent if and only if we find for $\binom{a_j}{b_j}$ that $a_j \le \min\{n - s, i - 1\}$ and $b_j \le \min\{n - q, n - i\}$. We show an example of the bounding graph (Figure 4).
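The edge set of the bounding graph follows directly from the two inequalities of Lemma 2. The Python sketch below builds it as an adjacency map from positions to admissible stream tuples; the function name and the dictionary representation are assumptions of this illustration, and the input is assumed to contain no zero tuples.

```python
def bounding_graph(seq):
    """Adjacency of the bounding graph: for every position i in
    {q+1, ..., n-s} (1-based), the indices (into the list of stream
    tuples) of the stream tuples that may be placed at position i."""
    n = len(seq)
    q = sum(1 for a, b in seq if a == 0)  # number of source tuples
    s = sum(1 for a, b in seq if b == 0)  # number of sink tuples
    streams = [(a, b) for a, b in seq if a > 0 and b > 0]
    return {
        i: [
            j for j, (a, b) in enumerate(streams)
            if a <= min(n - s, i - 1) and b <= min(n - q, n - i)
        ]
        for i in range(q + 1, n - s + 1)
    }
```

Any position whose list is empty already certifies that no perfect matching, and hence no dag realization, exists.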




Fig. 4. Bounding graph G s for sequence S ■- (°), (°), (*), Q, (*), ({), Q, (*), Q. 
One perfect matching (thick red edges) leads to the topological order ( 2 ) , ( 3 ) , ( 4 ) , (J 
which is realizable, whereas another perfect matching (thick blue edges) gives the 
topological order (J) , Q) , (^) , (3) which is not realizable. 



A perfect matching in this bounding graph gives us a possible topological sorting with respect to Lemma 2. This means, we assign to each stream tuple $\binom{a_j}{b_j}$ in $S$ the number $i$ if and only if $(v_i, w_j)$ is a matching edge in the chosen perfect matching. Clearly, there does not exist a dag realization of sequence $S$ if $B_S$ does not contain a perfect matching. Unfortunately, the computation of the number of perfect matchings in a bipartite graph is known to be #P-hard [Val79]. On the other hand, there exists a polynomial-time algorithm for the problem of uniformly sampling a perfect matching within a bipartite graph by Jerrum, Sinclair and Vigoda [JSV04]. They use a Markov chain based algorithm. The number of necessary steps in this algorithm is measured by the so-called mixing time $\tau_\epsilon$, where $\epsilon$ denotes the variation distance to the uniform
distribution. They proved a worst case mixing time of 0(n 8 (n\ogn + log -)log -). 
Up to now, we do not know whether we really need a uniform distribution, but we do not want to eliminate certain topological orderings. Our second version of a randomized algorithm - the bounding permutation algorithm (Rand II) - chooses in each trial a topological sorting by uniformly sampling a perfect matching in $B_S$ and then applies the realization algorithm for a given topological order (Lemma 1). For our experiments with very small instances, we sampled uniformly by enumerating all permutations of stream tuples.

Our third randomized algorithm - the opposed permutation algorithm (Rand III) - exploits the non-trivial result in Corollary 1 about opposed topological sortings. It uses for one trial Algorithm 1 with a change in line 3. We replace line 3 by: "Sample a tuple $\binom{a_j}{b_j} \in V'_{min}$ uniformly at random." If possible, we restrict the set $V'_{min}$ before line 3, i.e., we check for the largest $\binom{a_i}{b_i} \in V'_{min}$ whether the bounds of Lemma 2 are respected for later positions. Let $k$ denote the number of recursive calls up to the current one. Expressed in terms of the original sequence, we have to choose the $(q + k)$-th tuple in the topological sorting in the current iteration. If $b_i = n - (q + k)$ for the lexicographically largest tuple $\binom{a_i}{b_i} \in V'_{min}$, then we set $V'_{min} := \{\binom{a_i}{b_i}\}$. The reason is that a larger position is not possible at all for this tuple, because the upper bound for $b_i$ decreases strictly, as shown in Lemma 2. At first glance it is not clear whether the restriction to a subset of permutations within the randomized algorithm really increases the chance to draw a realizable topological sorting. This version of the algorithm only constructs dag realizations which possess an opposed topological sorting. Hence, we also exclude possible topological sortings which are not opposed topological sortings. However, empirically this idea pays off.
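One trial of the opposed permutation idea can be sketched as a loop that repeatedly samples a candidate from the set of minimal stream tuples and applies the reduction step. The code below is a simplified illustration, not the implementation used in the experiments: all names are hypothetical, the additional restriction via the bounds of Lemma 2 is omitted, and $\sum a_i = \sum b_i$ is assumed.

```python
import random


def source_sink_realizable(sources, sinks):
    """Greedy test: connect each source (largest first) to the largest sinks."""
    sinks = list(sinks)
    for b in sorted(sources, reverse=True):
        sinks = sorted((a for a in sinks if a > 0), reverse=True)
        if len(sinks) < b:
            return False
        for i in range(b):
            sinks[i] -= 1
    return not any(sinks)


def rand3_trial(seq, rng=random):
    """One trial: sample from the minimal candidates instead of looping."""
    seq = [t for t in seq if t != (0, 0)]
    while True:
        sources = sorted((b for a, b in seq if a == 0), reverse=True)
        streams = sorted({t for t in seq if t[0] > 0 and t[1] > 0})
        if not streams:
            sinks = [a for a, b in seq if b == 0]
            return source_sink_realizable(sources, sinks)
        vmin = [
            t for t in streams
            if t[0] <= len(sources) and not any(
                s != t and s[0] <= t[0] and s[1] >= t[1] for s in streams
            )
        ]
        if not vmin:
            return False
        a, b = rng.choice(vmin)  # replaces the loop in line 3
        rest = [t for t in seq if t[0] > 0]
        rest.remove((a, b))
        reduced = [x - 1 for x in sources[:a]] + sources[a:] + [b]
        seq = [(0, x) for x in reduced if x > 0] + rest


def rand3(seq, trials=100, rng=random):
    """Repeat independent trials; True as soon as one trial succeeds."""
    return any(rand3_trial(seq, rng) for _ in range(trials))
```

A failed trial does not prove non-realizability; only repeated trials raise the success rate, which is the point of comparison with the deterministic lexmax strategy.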

Our fourth randomized version combines the opposed permutation algorithm with 
several reduction rules which exploit the symmetric roles of in- and outdegrees and 
degree dominance of tuples. The following reduction rules can be used to simplify a 
given sequence. Additional (similar) rules are possible, but we restrict our description 
to those rules which have been implemented and used in our experiments. 

1. Exploit symmetric roles of in- and outdegrees. If |V^j in | = 1, the reduction step in 
Algorithm [I] is safe (for any realizable sequence). Since the problem is symmetric 
with respect to in- and outdegrees, we can exchange their roles. This suggests to 
check the size of V^ in from "both sides". If either of these sets has size one, the 
corresponding reduction step is safe and should be preferably applied. 

2. Degree dominance of some tuple. Suppose that some bi is so large that this number 
matches the number of available stream and sink tuples, then vertex Vi has to be 
connected with all current non-sources. Hence, sequence S can be reduced by 
deleting a source tuple (") or by updating a stream tuple (£*) to a new sink ("*), 
respectively, and by subtracting one from all aj > with i =/= j. The symmetric 
reduction rule can be stated for a dominating a^-value. 

3. Dominating total degree of some stream tuple. Suppose there is a stream tuple 
with aj + bi = n — 1. Then we can conclude that this tuple has to be connected 
with all other tuples. It is unclear which stream tuples come before and which 
after (?*) in some realization. However, we can be sure that it is connected with 
all sources and all sinks (in particular a, < q and bi < s must hold). In order to 
ensure that later recursive reduction steps do not introduce parallel arcs, we only 
apply a more conservative reduction. Namely, we connect the vertex $v_i$ only with 
sources for which $b_j = 1$ and with sinks for which $a_j = 1$. 

We additionally apply these rules whenever applicable and call the randomized 
algorithm opposed permutation algorithm with reduction rules (Rand IV). 
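
Rule 2 can be sketched in a few lines of Python. This is only one possible reading of the rule, not the implementation used in the experiments: the function name `apply_outdegree_dominance`, the encoding of a sequence as a list of `(a, b)` pairs, and the choice to count only the other tuples with positive indegree as "available stream and sink tuples" are assumptions made for illustration.

```python
def apply_outdegree_dominance(seq, i):
    """Apply reduction rule 2 to tuple i, or return None if it does not apply.

    seq is a list of (indegree a, outdegree b) pairs (assumed encoding).
    The rule applies when b_i equals the number of other non-source tuples,
    i.e. other tuples with a_j > 0 (assumed interpretation).
    """
    a_i, b_i = seq[i]
    non_sources = [j for j, (a, _) in enumerate(seq) if j != i and a > 0]
    if b_i == 0 or b_i != len(non_sources):
        return None

    reduced = []
    for j, (a, b) in enumerate(seq):
        if j == i:
            new = (a_i, 0)      # a source becomes (0, 0), a stream becomes a sink
        elif a > 0:
            new = (a - 1, b)    # v_i sends one arc to every other non-source
        else:
            new = (a, b)        # other sources are untouched
        if new != (0, 0):       # zero tuples are deleted
            reduced.append(new)
    return reduced


print(apply_outdegree_dominance([(0, 2), (1, 1), (2, 0)], 0))
```

In the call above the source with outdegree 2 must reach both non-sources, so it is removed and both indegrees drop by one.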

3.2 Experimental comparison of randomized algorithms 

Experiment 1: Which randomized algorithm possesses the best success probability for 
one trial? We define the success probability p(m) as the probability that a given 
sequence $S := \binom{a_1}{b_1}, \dots, \binom{a_n}{b_n}$ with $m := \sum_{i=1}^{n} a_i$ can be realized by a specified 
randomized algorithm in one single trial. In this experiment we test the four versions 
of our randomized algorithms with all non-trivial sequences (as defined earlier, that is, 
sequences with at least one stream tuple) of 9 tuples, see Figure 5. Moreover, we display 
the fraction of lexmax sequences to compare the deterministic lexmax strategy with our 
randomized strategy. 

Observations: Randomized version 4 (opposed permutation algorithm with reduction 
rules) clearly outperforms all other strategies. We also observe that the success prob- 
ability p depends on the density m of the dag realizations. Sparse and dense dags 
have the best success probability. The deterministic lexmax strategy has almost the 
same success probability as our best randomized version. Of course, we can repeat 
a randomized algorithm and thereby boost the success rate, which is not possible for 
the deterministic variant. Nevertheless, the good performance of the simple lexmax 
strategy is quite remarkable: it clearly outperforms an arbitrary choice of an element 
from $V_{\min}$ in line 3 of Algorithm 1 (as realized in randomized version 3). 
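
The effect of repeating a randomized algorithm can be quantified under the assumption that the trials are independent and each succeeds with the same probability p: after k trials the probability of at least one success is 1 - (1 - p)^k. The helper names below are hypothetical and serve only to illustrate this arithmetic.

```python
def amplified_success(p, k):
    """Probability of at least one success in k independent trials."""
    if not 0.0 <= p <= 1.0 or k < 0:
        raise ValueError("need 0 <= p <= 1 and k >= 0")
    return 1.0 - (1.0 - p) ** k


def trials_needed(p, target):
    """Smallest k with amplified_success(p, k) >= target."""
    if not 0.0 < p <= 1.0 or not 0.0 <= target < 1.0:
        raise ValueError("need 0 < p <= 1 and 0 <= target < 1")
    k = 1
    while amplified_success(p, k) < target:
        k += 1
    return k


print(amplified_success(0.5, 3))   # 0.875
print(trials_needed(0.5, 0.9))     # 4
```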

Experiment 2: We consider the success probability of all randomized algorithms for 
non-lexmax sequences which are not reducible by our reduction rules. Noting 
that an impressively large fraction of sequences is efficiently solvable by the 
deterministic lexmax strategy combined with our reduction rules, we should ask: How 
well do our randomized algorithms perform for the remaining difficult cases, that is, 
for non-reducible non-lexmax sequences? This is indeed the most interesting 
question, because the best approach for realizing a given sequence S would be to 
test first whether S is a reducible lexmax sequence, and to use a randomized 
algorithm only if this is not the case. Hence, we now determine the success probability 
p(m) for all non-reducible non-lexmax sequences, see Figure 6. 

Observations: As in the previous experiment, randomized version 4 has the overall best 
success probability p, but in sharp contrast we observe a completely different 
dependence on m. One possible explanation could be that for high densities our reduction 
rules have been applied more often. Note that the overall percentage of non-reducible 
non-lexmax sequences in the set of all non-trivial sequences with 9 tuples is so tiny 
(see the brown curve in Figure 6), in particular for low densities, that we can 
realize almost all sequences after two or three trials. 



4 Conclusion 



In this paper we have studied the performance of a simple linear-time heuristic to 
solve the NP-complete dag realization problem and several randomized variants. We 
give a brief summary of our main observations. 
1. Dag sequences S with low or high density are almost always lexmax 
sequences. 

2. Dag sequences with a small distance to opposed d(S) are to a large extent lexmax 
sequences. 

3. There is a good chance to realize a dag sequence by the lexmax strategy, especially 
for acyclic real-world networks. 

For a given (real-world) sequence we propose the following recipe: Choose 
Algorithm 1 with the lexmax strategy and apply the reduction rules 1-3. If this run is 
unsuccessful, apply version 4 of our randomized algorithms, i.e. the opposed permutation 
algorithm with reduction rules. For most dag sequences in practice this gives a 
fair chance of finding a realization. The surprisingly broad success of the lexmax 
strategy suggests that there might be further subclasses of instances on which it is 
provably correct. In future work we would like to characterize the class of instances 
for which the lexmax strategy is provably correct. 
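
The recipe amounts to a simple control flow, sketched here in Python. The two solvers are passed in as callables because their implementations are not part of this sketch; the names `realize_with_recipe`, `lexmax_with_rules`, `rand_iv` and the trial limit `max_trials` are hypothetical, and each solver is assumed to return a list of arcs on success and `None` on failure.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Degrees = Sequence[Tuple[int, int]]          # (indegree, outdegree) per vertex
Arcs = List[Tuple[int, int]]
Solver = Callable[[Degrees], Optional[Arcs]]


def realize_with_recipe(seq: Degrees,
                        lexmax_with_rules: Solver,
                        rand_iv: Solver,
                        max_trials: int = 3) -> Optional[Arcs]:
    """Try the deterministic strategy first, then repeat the randomized one."""
    arcs = lexmax_with_rules(seq)            # deterministic lexmax run with rules 1-3
    if arcs is not None:
        return arcs
    for _ in range(max_trials):              # independent randomized trials
        arcs = rand_iv(seq)
        if arcs is not None:
            return arcs
    return None                              # no realization found; S may still be realizable
```

A return value of `None` is inconclusive: it only means that neither heuristic found a realization within the allowed number of trials.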
Appendix 



A Proofs 



In this section we present the proofs for our theoretical results. For the first lemma, 
recall the corresponding setting from Section 1. 

Let $S := \binom{a_1}{b_1}, \dots, \binom{a_{q+1}}{b_{q+1}}, \dots, \binom{a_n}{b_n}$ be an arbitrary sequence with a given 
topological order. Without loss of generality, we may assume that the source tuples come 
first in the prescribed numbering and are ordered decreasingly with respect to their $b_i$ 
values. Let 

$$S' := \binom{0}{b_1 - 1}, \dots, \binom{0}{b_{a_{q+1}} - 1}, \dots, \binom{0}{b_q}, \binom{0}{b_{q+1}}, \dots, \binom{a_n}{b_n}.$$

If we get zero tuples in S', then we delete them and denote the new sequence for 
simplicity also by S'. Furthermore, we label this sequence with a new numbering 
starting from one to its length and consider this sorting as the given topological 
ordering of S'. 

Lemma 1. S is a dag sequence for a given topological order $\Leftrightarrow$ S' is a dag sequence 
for its corresponding topological order. 

Proof. $\Leftarrow$: Trivial. 

$\Rightarrow$: We consider a dag realization for the given topological ordering of dag sequence 
S. Clearly, we find at least $a_{q+1}$ sources. This is true, because a first vertex with 
non-empty incoming neighborhood set in a topological sorting (a sink or a stream vertex) 
of a dag can only possess sources in its incoming neighborhood set. Otherwise, this 
numbering is not a topological sorting. Assume there is no dag realization for the 
given topological order such that the $a_{q+1}$ largest sources are connected with vertex 
$v_{q+1}$. In this case, we consider a dag realization G for this topological order such 
that the maximum possible number of largest sources is connected with vertex $v_{q+1}$. 
Then we have two sources $v_i$ and $v_j$ with $(v_i, v_{q+1}) \notin A$, $(v_j, v_{q+1}) \in A$, $b_i > b_j$ 
and $i, j < q + 1$. Since $b_i > b_j$ and $v_{q+1}$ is the first non-source tuple, there is a 
non-source vertex $v_k$ ($k > q + 1$) with $(v_i, v_k) \in A$ and $(v_j, v_k) \notin A$. We define a 
new digraph $G^* := \bigl(G \setminus \{(v_i, v_k), (v_j, v_{q+1})\}\bigr) \cup \{(v_i, v_{q+1}), (v_j, v_k)\}$. Obviously, $G^*$ 
is a dag realization for the given topological order of sequence S. This contradicts 
the assumption that G is a dag realization with the maximum possible number of 
largest sources for vertex $v_{q+1}$. Hence, there exists a dag realization G for the given 
topological order such that vertex $v_{q+1}$ has in its incoming neighborhood set only 
the $a_{q+1}$ largest sources from the set of all sources $v_i$ with $i < q + 1$. We delete the 
incoming neighborhood set of vertex $v_{q+1}$ and obtain a dag realization for sequence S' 
for its given topological ordering. □ 
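
Read constructively, Lemma 1 suggests a greedy test for a fixed topological order: process the vertices in that order and connect each one to the sources with the largest remaining outdegree. The following Python sketch illustrates this idea; the function name `realize_for_order` and the encoding of S as a list of `(a, b)` pairs listed in the prescribed topological order are assumptions, and ties between equally large sources are broken by position.

```python
def realize_for_order(seq):
    """Greedy realization of seq for the topological order given by list position.

    seq[v] = (a, b) is the (indegree, outdegree) of vertex v (assumed encoding).
    Returns a list of arcs (u, v) with u before v, or None if the greedy fails.
    """
    residual = {}                       # vertex -> outdegree not yet used
    arcs = []
    for v, (a, b) in enumerate(seq):
        # All earlier vertices with unused outdegree act as sources for v.
        candidates = sorted((u for u in residual if residual[u] > 0),
                            key=lambda u: -residual[u])
        if a > len(candidates):
            return None                 # not enough distinct sources for v
        for u in candidates[:a]:        # the a largest sources
            residual[u] -= 1
            arcs.append((u, v))
        residual[v] = b                 # v now behaves like a source
    if any(r > 0 for r in residual.values()):
        return None                     # some outdegree could not be placed
    return arcs


print(realize_for_order([(0, 2), (0, 1), (1, 1), (3, 0)]))
```

Each vertex chooses distinct earlier vertices, so no parallel arcs arise, and every arc points forward in the given order, so the result is acyclic.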

The existence of a simple solution for forest sequences is not so surprising, as there 
is a simple approach to construct a dag realization if there is one. First, one can apply 
a digraph realization algorithm. If we do not find a digraph realization, there 
cannot be a dag realization either. Assume we have a digraph realization G = (V, A). 
If G possesses no directed cycle, then it is a dag realization and we are done. Let 
us assume that G has at least one directed cycle. In this case, there exist at least two 
weak components (i.e. connected components in the underlying undirected graph), 
because the underlying undirected graph has only n - 1 edges and is not a forest. 
Hence, we can choose an arc $(v_1, v_2)$ of the directed cycle in the first component and 
a further arc $(v_3, v_4)$ from the second weak component. We construct the new digraph 
G' := (V, A') with $A' := \bigl(A \setminus \{(v_1, v_2), (v_3, v_4)\}\bigr) \cup \{(v_1, v_4), (v_3, v_2)\}$. We apply a 
sequence of such steps (at most n steps) until we get an acyclic realization. This is 
possible because as long as we can find a directed cycle we also have more than one 
weak component. (Note that the underlying graph is not necessarily a simple graph. It 
can contain parallel edges corresponding to directed 2-cycles of the initial digraph 
realization G.) Hence, we can conclude that each forest sequence which is a digraph 
sequence is also a dag sequence. Clearly, this can be decided in polynomial time. 
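
The arc-exchange procedure for forest sequences can be sketched as follows. This is a minimal illustration, assuming vertices are numbered 0 to n - 1, the input is a digraph realization with exactly n - 1 arcs and no isolated vertices, and arcs are given as pairs; the function names are hypothetical.

```python
def find_cycle_arc(n, arcs):
    """Return one arc lying on a directed cycle, or None if the digraph is acyclic."""
    succ = {v: [] for v in range(n)}
    for u, v in arcs:
        succ[u].append(v)
    state = [0] * n                      # 0 = unvisited, 1 = on DFS stack, 2 = done

    def dfs(u):
        state[u] = 1
        for v in succ[u]:
            if state[v] == 1:
                return (u, v)            # back arc closes a directed cycle
            if state[v] == 0:
                found = dfs(v)
                if found:
                    return found
        state[u] = 2
        return None

    for s in range(n):
        if state[s] == 0:
            found = dfs(s)
            if found:
                return found
    return None


def weak_component_labels(n, arcs):
    """Label every vertex with a representative of its weak component (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in arcs:
        parent[find(u)] = find(v)
    return [find(v) for v in range(n)]


def make_acyclic(n, arcs):
    """Exchange arc heads until no directed cycle remains; degrees are preserved."""
    arcs = set(arcs)
    while True:
        cycle_arc = find_cycle_arc(n, arcs)
        if cycle_arc is None:
            return arcs
        comp = weak_component_labels(n, arcs)
        v1, v2 = cycle_arc
        other = next(((u, v) for u, v in arcs if comp[u] != comp[v1]), None)
        if other is None:
            raise ValueError("no arc in another weak component; not a forest sequence?")
        v3, v4 = other
        arcs -= {(v1, v2), (v3, v4)}
        arcs |= {(v1, v4), (v3, v2)}     # merges the two weak components


print(sorted(make_acyclic(5, [(0, 1), (1, 0), (2, 3), (3, 4)])))
```

Every exchange keeps all in- and outdegrees unchanged and reduces the number of weak components by one, which is why the loop terminates for inputs satisfying the stated assumptions.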

Note that sequence S' may contain zero tuples. In this case, we delete these tuples 
and renumber the tuples of this new sequence $S' := \binom{a'_1}{b'_1}, \dots, \binom{a'_{n'}}{b'_{n'}}$ from 1 to n'. 

Clearly, we have $\sum_{j=1}^{n'} a'_j = n - a_i - 1 \le n' - 1$, because we deleted exactly the indegree 
of tuple $\binom{a_i}{b_i}$ in S and it is only possible to delete at most $a_i$ new zero tuples in S'. 
Hence, Theorem 2 results in a recursive algorithm. At each step, one has to choose an 
arbitrary stream tuple $\binom{a_i}{b_i}$ with indegree at most k, then reduce the $a_i$ largest 
sources by one and set the indegree $a_i$ of this tuple to zero. On the other hand, the 
set $V_{\min}$ of Theorem 1 is a subset of the allowed tuple set in Theorem 2. Hence, we 
get Corollary 2. 
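
A single step of this recursive algorithm can be sketched in Python as follows. The function name `forest_reduction_step` and the list-of-pairs encoding are assumptions; the sketch performs only the reduction itself and does not decide which stream tuple to choose.

```python
def forest_reduction_step(seq, i):
    """Reduce stream tuple i against the a_i largest sources and drop zero tuples.

    seq is a list of (indegree a, outdegree b) pairs (assumed encoding).
    """
    a_i, b_i = seq[i]
    if a_i == 0 or b_i == 0:
        raise ValueError("tuple i must be a stream tuple")
    sources = [j for j, (a, b) in enumerate(seq) if a == 0 and b > 0]
    if a_i > len(sources):
        raise ValueError("indegree of tuple i exceeds the number of sources")

    # The a_i sources with the largest outdegree (ties broken by position).
    largest = sorted(sources, key=lambda j: -seq[j][1])[:a_i]

    reduced = list(seq)
    for j in largest:
        reduced[j] = (0, seq[j][1] - 1)   # one arc from source j to vertex i
    reduced[i] = (0, b_i)                 # indegree of tuple i is now satisfied
    return [t for t in reduced if t != (0, 0)]


print(forest_reduction_step([(0, 2), (0, 1), (1, 1), (2, 0), (1, 0)], 2))
```

In the call above, n = 5 and the indegrees sum to 4 = n - 1, so the input is a forest sequence; the stream tuple (1, 1) is attached to the source with outdegree 2.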

Theorem 2 (Realization of forest dags). Let $S := \binom{a_1}{b_1}, \dots, \binom{a_n}{b_n}$ with $\sum_{i=1}^{n} a_i = n - 1$ 
be a canonically sorted sequence containing $k > 0$ source tuples. Furthermore, 
we assume that S is not a source-sink-sequence. Consider an arbitrary stream tuple 
$\binom{a_i}{b_i}$ with $a_i \le k$. 
S is a dag sequence if and only if 

$$S' := \binom{0}{b_1 - 1}, \dots, \binom{0}{b_{a_i} - 1}, \binom{0}{b_{a_i + 1}}, \dots, \binom{0}{b_k}, \dots, \binom{a_{i-1}}{b_{i-1}}, \binom{0}{b_i}, \binom{a_{i+1}}{b_{i+1}}, \dots, \binom{a_n}{b_n}$$

is a dag sequence. 

Proof (of Realization of forest dags). $\Leftarrow$: Trivial. 

$\Rightarrow$: Let S be a dag sequence with $k \ge 1$ source tuples. We consider a dag 
realization G such that we have a minimum number of weak components. Clearly, the 
underlying undirected graph is then a forest without undirected cycles. Furthermore, 
we consider a dag realization G as described where the incoming neighborhood set 
of vertex $v_i$ consists of a maximum possible number of sources. Assume there is a 
vertex $v_{i^-} \in N_G^-(v_i)$ which is not a source. (The notation $N_G^-(v)$ describes the 
in-neighborhood of vertex v in G.) Then we can conclude that there exists a source q 
and a vertex $v_j$ with $(q, v_j) \in A$ but $(q, v_i) \notin A$, because we have $d_G^-(v_i) = a_i \le k$ by 
our assumption. We distinguish between two cases. 

case 1: There exists no underlying undirected path between vertices $v_j$ and $v_i$. 
case 2: There exists exactly one underlying undirected path P between vertices $v_j$ 
and $v_i$. 

Note that there cannot be more than one underlying undirected path, because the 
underlying graph of G is by our assumption a forest. Let us start with case 1. There 
cannot be an underlying undirected path between vertices q and $v_i$, otherwise we 
would find the excluded undirected path between $v_i$ and $v_j$, because q is adjacent to $v_j$. 
We construct the dag $G' = (V, A')$ with $A' := \bigl(A \setminus \{(q, v_j), (v_{i^-}, v_i)\}\bigr) \cup \{(q, v_i), (v_{i^-}, v_j)\}$. 
Consider Figure 7. Digraph G' is indeed a dag, because G contains neither an 
underlying undirected path between $v_i$ and q nor one between $v_{i^-}$ and $v_j$ by our 
assumptions. Hence, we did not construct an underlying undirected cycle and clearly 
no directed cycle. 



Fig. 7. Case 1: no underlying undirected path between q and $v_i$ in G. 



But then G' is a dag realization with a minimum number of weak components and 
a larger number of sources in the neighborhood set of $v_i$ than in dag G. Contradiction! 
It remains to consider case 2. Since vertex $v_i$ is a stream tuple, it has an out-neighbor 
$v_{i^+}$, and we define the following dag $G' = (V, A')$ with 
$A' := \bigl(A \setminus \{(q, v_j), (v_{i^-}, v_i), (v_i, v_{i^+})\}\bigr) \cup \{(q, v_i), (v_{i^-}, v_{i^+}), (v_i, v_j)\}$, 
as can be seen in Figure 8. 

Note that $v_j$ and $v_{i^-}$ are not necessarily distinct vertices. If they coincide, 
then we replace vertex $v_{i^-}$ by $v_j$ in this construction. Since we destroyed by our 
construction all underlying unique paths in G between $v_i$ and q, between $v_{i^+}$ and $v_{i^-}$, 
and between $v_j$ and $v_i$, digraph G' is indeed acyclic and possesses a minimum number 
of weak components. On the other hand, vertex $v_i$ is connected with a larger number 
of sources than in G. Contradiction! 

Hence, we can assume that there exists a dag realization G = (V, A) with a 
minimum number of weak components such that the incoming neighborhood set of 
vertex $v_i$ only contains sources. We consider a dag realization such that vertex $v_i$ is 
connected with the maximum possible number of largest sources. Assume there is a 
source q' > q such that $(q, v_i) \in A$ and $(q', v_i) \notin A$. Then there exists a further vertex 
$v_j$ with $(q', v_j) \in A$. We distinguish again between two cases. If there does not exist 
an underlying undirected path P between q and $v_j$ or between q' and $v_i$, we define 
the new dag $G' := (V, A')$ with $A' := \bigl(A \setminus \{(q, v_i), (q', v_j)\}\bigr) \cup \{(q', v_i), (q, v_j)\}$ with a 
minimum number of weak components but with one larger source connected with $v_i$ 
than in G (see Figure 9). Contradiction! 
Fig. 8. Case 2: one unique underlying undirected path P between q and $v_i$. 

Fig. 9. A larger source q' is not connected with vertex $v_i$. 



Hence, we next assume that there is one underlying undirected path $P = q', v_{j'}, \dots, v_i$ 
between q' and $v_i$. (A further path between q and $v_j$ cannot exist, because in this case 
we would find an underlying cycle.) Note that it is possible that we find $v_{j'} = v_j$. In 
this case we replace $v_{j'}$ by $v_j$ in the following steps. We define the new dag realization 
$G' = (V, A')$ with $A' := \bigl(A \setminus \{(q, v_i), (q', v_{j'})\}\bigr) \cup \{(q', v_i), (q, v_{j'})\}$ with a minimum 
number of weak components, because we destroyed by our construction the underlying 
unique paths from q to $v_{j'}$ and from q' to $v_i$. Dag G' possesses one more of the largest 
sources connected to $v_i$ than G. Contradiction! As a last case it remains that there 
could exist an underlying undirected path $P = q, \dots, v_j$ from q to $v_j$, see Figure 10. 



Since q' is a larger source than q, there exists a further vertex $v_{j'}$ with $(q', v_{j'}) \in A$. 
Then we construct the dag realization $G' = (V, A')$ with 
$A' := \bigl(A \setminus \{(q, v_i), (q', v_{j'})\}\bigr) \cup \{(q', v_i), (q, v_{j'})\}$. Indeed, we destroyed in G the unique 
paths between q and $v_{j'}$ and between q' and $v_i$. Hence, G' is a dag with a minimum 
number of weak components but with one more of the largest sources connected to $v_i$ 
than in G. Contradiction! So, there exists a dag realization G such that vertex $v_i$ has 
in its incoming neighborhood set only largest sources. We delete the arcs from these 
sources to $v_i$ in G, and get a dag realization G' with dag sequence S'. □ 
Fig. 10. A larger source q' is not connected with vertex $v_i$ and there exists an 
underlying path P between q and $v_j$. 



Lemma 2 (necessary criterion for the realizability of dag sequences). Let 
S be a dag sequence. Denote the number of source tuples in S by q and the number 
of sink tuples by s. Then it follows that $a_i \le \min\{n - s, i - 1\}$ and $b_i \le \min\{n - q, n - i\}$ 
for all $i \in \mathbb{N}_n$ for each labeling of S corresponding to a topological order. 

Proof. Let $S := \binom{a_1}{b_1}, \dots, \binom{a_n}{b_n}$ be a labeling of S corresponding to a topological sorting 
of a dag realization G. Assume there is a $j \in \mathbb{N}_n$ with $a_j > \min\{n - s, j - 1\}$. 
(The case $b_j > \min\{n - q, n - j\}$ can be treated analogously.) G is a subdigraph of a 
complete dag $G^*$ with topological sorting $v_1, \dots, v_n$. Clearly, we have $d^-_{G^*}(v_j) = j - 1$. 
We distinguish between two cases. If we have $\min\{n - s, j - 1\} = n - s$, then it 
follows that $a_j = d^-_G(v_j) > n - s$. Then the incoming neighborhood set $N_G^-(v_j)$ consists 
of more than n - s vertices, in contradiction to the fact that $N_G^-(v_j)$ contains at 
most n - s vertices. Let us now assume $\min\{n - s, j - 1\} = j - 1$. Then we get 
$d^-_{G^*}(v_j) = j - 1 < a_j = d^-_G(v_j)$, a contradiction to our assumption that G is a 
subdigraph of $G^*$. □ 
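
The criterion of Lemma 2 is easy to test for a concrete labeling. The following Python sketch assumes that the sequence is given as a list of `(a, b)` pairs in the order of the labeling, that a source tuple has a = 0 and b > 0, and that a sink tuple has a > 0 and b = 0; the function name is hypothetical. Passing the test is necessary but not sufficient for realizability.

```python
def satisfies_lemma2(seq):
    """Check a_i <= min(n - s, i - 1) and b_i <= min(n - q, n - i) for all i."""
    n = len(seq)
    q = sum(1 for a, b in seq if a == 0 and b > 0)   # number of source tuples
    s = sum(1 for a, b in seq if a > 0 and b == 0)   # number of sink tuples
    for i, (a, b) in enumerate(seq, start=1):        # 1-based position i
        if a > min(n - s, i - 1):
            return False
        if b > min(n - q, n - i):
            return False
    return True


print(satisfies_lemma2([(0, 2), (0, 1), (1, 1), (3, 0)]))   # True
print(satisfies_lemma2([(1, 0), (0, 1)]))                   # False
```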



B Example for Algorithm 1 

Example 1. Consider the nine-tuple sequence S whose recursion tree under Algorithm 1 
is shown in Figure 11. The symbol x here denotes tuples of S which have been deleted 
after being reduced to $\binom{0}{0}$. The rightmost path (green) corresponds to the lexmax 
strategy, which does not lead to a realization. 
Fig. 11. Recursion tree for Example 1. The symbol x here denotes tuples of S which 
have been deleted after being reduced to $\binom{0}{0}$. The fourth tree level, where the original 
sequence is reduced to a source-sink sequence, is marked with red boxes. 
Table 2. Systematic experiments with n = 9. For $m \in \{9, \dots, 35\}$, we show the 
number of non-trivial sequences (i.e. sequences which have at least one stream tuple, 
column 3), the number of sequences where the pure lexmax strategy fails (column 4), 
and finally the number of sequences where the lexmax strategy combined with our 
reduction rules fails (column 5). 



